automated pipeline
APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets
The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scalable and structured manner. Each data in our dataset is verified through three hierarchical stages: format checking, actual function executions, and semantic verification, improving its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models.
An Automated Pipeline for Few-Shot Bird Call Classification: A Case Study with the Tooth-Billed Pigeon
Jana, Abhishek, Uili, Moeumu, Atherton, James, O'Brien, Mark, Wood, Joe, Brickson, Leandra
This paper presents an automated one-shot bird call classification pipeline designed for rare species absent from large publicly available classifiers like BirdNET and Perch. While these models excel at detecting common birds with abundant training data, they lack options for species with only 1-3 known recordings-a critical limitation for conservationists monitoring the last remaining individuals of endangered birds. To address this, we leverage the embedding space of large bird classification networks and develop a classifier using cosine similarity, combined with filtering and denoising preprocessing techniques, to optimize detection with minimal training data. We evaluate various embedding spaces using clustering metrics and validate our approach in both a simulated scenario with Xeno-Canto recordings and a real-world test on the critically endangered tooth-billed pigeon (Didunculus strigirostris), which has no existing classifiers and only three confirmed recordings. The final model achieved 1.0 recall and 0.95 accuracy in detecting tooth-billed pigeon calls, making it practical for use in the field. This open-source system provides a practical tool for conservationists seeking to detect and monitor rare species on the brink of extinction.
- North America > United States > Kentucky (0.04)
- Oceania > Samoa > Gagaifomauga > Safotu (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- (8 more...)
- Government (0.46)
- Law > Environmental Law (0.34)
CoRe: An Automated Pipeline for The Prediction of Liver Resection Complexity from Preoperative CT Scans
Ali, Omar, Bone, Alexandre, Accardo, Caterina, Belkouchi, Omar, Rohe, Marc-Michel, Vibert, Eric, Vignon-Clementel, Irene
Surgical resections are the most prevalent curative treatment for primary liver cancer. Tumors located in critical positions are known to complexify liver resections (LR). While experienced surgeons in specialized medical centers may have the necessary expertise to accurately anticipate LR complexity, and prepare accordingly, an objective method able to reproduce this behavior would have the potential to improve the standard routine of care, and avoid intra- and postoperative complications. In this article, we propose CoRe, an automated medical image processing pipeline for the prediction of postoperative LR complexity from preoperative CT scans, using imaging biomarkers. The CoRe pipeline first segments the liver, lesions, and vessels with two deep learning networks. The liver vasculature is then pruned based on a topological criterion to define the hepatic central zone (HCZ), a convex volume circumscribing the major liver vessels, from which a new imaging biomarker, BHCZ is derived. Additional biomarkers are extracted and leveraged to train and evaluate a LR complexity prediction model. An ablation study shows the HCZ-based biomarker as the central feature in predicting LR complexity. The best predictive model reaches an accuracy, F1, and AUC of 77.3, 75.4, and 84.1% respectively.
- Europe > France > Île-de-France (0.05)
- Asia > Singapore (0.04)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.89)
AutoCP: Automated Pipelines for Accurate Prediction Intervals
Zhang, Yao, Zame, William, van der Schaar, Mihaela
Successful application of machine learning models to real-world prediction problems, e.g. financial forecasting and personalized medicine, has proved to be challenging, because such settings require limiting and quantifying the uncertainty in the model predictions, i.e. providing valid and accurate prediction intervals. Conformal Prediction is a distribution-free approach to construct valid prediction intervals in finite samples. However, the prediction intervals constructed by Conformal Prediction are often (because of over-fitting, inappropriate measures of nonconformity, or other issues) overly conservative and hence inadequate for the application(s) at hand. This paper proposes an AutoML framework called Automatic Machine Learning for Conformal Prediction (AutoCP). Unlike the familiar AutoML frameworks that attempt to select the best prediction model, AutoCP constructs prediction intervals that achieve the user-specified target coverage rate while optimizing the interval length to be accurate and less conservative. We tested AutoCP on a variety of datasets and found that it significantly outperforms benchmark algorithms.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)